Abstract

Animals living in groups make movement decisions 
that depend, among other factors, on social inter- 
actions with other group members. Our present 
understanding of social rules in animal collectives 
is mainly based on empirical fits to observations, 
with less emphasis on obtaining first-principles approaches
that allow their derivation. Here we show
that patterns of collective decisions can be derived 
from the basic ability of animals to make proba- 
bilistic estimations in the presence of uncertainty. 
We build a decision-making model with two stages: 
Bayesian estimation and probabilistic matching. In 
the first stage, each animal makes a Bayesian esti- 
mation of which behavior is best to perform taking 
into account personal information about the envi- 
ronment and social information collected by observ- 
ing the behaviors of other animals. In the proba- 
bility matching stage, each animal chooses a be- 
havior with a probability equal to the Bayesian- 
estimated probability that this behavior is the most 
appropriate one. This model derives very simple 
rules of interaction in animal collectives that de- 
pend only on two types of reliability parameters, 
one that each animal assigns to the other animals 
and another given by the quality of the non-social 
information. We test our model by obtaining the- 
oretically a rich set of observed collective patterns 
of decisions in three-spined sticklebacks, Gasterosteus
aculeatus, a shoaling fish species. The quantitative
link shown between probabilistic estimation
and collective rules of behavior allows a better con- 
tact with other fields such as foraging, mate selec- 
tion, neurobiology and psychology, and gives predic- 
tions for experiments directly testing the relation- 
ship between estimation and collective behavior. 



Author Summary 



Animals need to act on uncertain data and with lim- 
ited cognitive abilities to survive. It is well known 
that our sensory and sensorimotor processing uses 
probabilistic estimation as a means to counteract 
these limitations. Indeed, the way animals learn, 
forage or select mates is well explained by proba- 
bilistic estimation. Social animals have an inter- 
esting new opportunity since the behavior of other 
members of the group provides a continuous flow of
indirect information about the environment. This 
information can be used to improve their estima- 
tions of environmental factors. Here we show that 
this simple idea can derive basic interaction rules 
that animals use for decisions in social contexts. 
In particular, we show that the patterns of choice 
of Gasterosteus aculeatus correspond very well to 
probabilistic estimation using the social informa- 
tion. The link found between estimation and col- 
lective behavior should help to design experiments 
of collective behavior testing for the importance of 
estimation as a basic property of how brains work. 

Introduction

Animals need to make decisions without certainty 
about which option is best. This uncertainty is due to
the ambiguity of sensory data but also to limited 
processing capabilities, and is an intrinsic and gen- 
eral property of the representation that animals can 
build about the world. A general way to make deci- 
sions in uncertain situations is to make probabilistic 
estimations. There is evidence that animals
use probabilistic estimations, for example in the
early stages of sensory perception [3UTl], sensory-motor
transformations [T2HI4], learning [T5l - [T7] and
behaviors in an ecological context such as strategies 
for food patch exploitation [TM^U] and mate selec- 
tion [H], among others [HHZHHllig . 

An additional source of information about the en- 
vironment may come from the behavior of other an- 
imals (social information) [53H1E1- This informa- 
tion can have different degrees of ambiguity. In 
particular cases, the behavior of conspecifics di- 
rectly reveals environmental characteristics (for ex- 
ample, food encountered by another individual in- 
forms about the quality of a food patch). Cases 
in which social information correlates well with the 
environmental characteristic of interest have been 
very well studied [29U37j. But in most cases social
information is ambiguous and potentially misleading
[^nilSHl. In spite of this ambiguity, there is
evidence that in some cases such as predator avoid- 
ance [SnmS and mate choice [H], animals use this 
kind of information. 

Social animals have a continuous flow of infor- 
mation about the environment coming from the be- 
haviours of other animals. It is therefore possible 
that social animals use it at all times, making prob- 
abilistic estimations to counteract its ambiguity. If 
this is the case, estimation of the environment using 
both non-social and social information might be a 
major determinant of the structure of animal col- 
lectives. In order to test this hypothesis, we have 
developed a Bayesian decision-making model that 
includes both personal and social information, that 
naturally weights them according to their reliabil- 
ity in order to get a better estimate of the environ- 
ment. All members of the group can then use these 
improved estimations to make better decisions, and 
collective patterns of decisions then emerge from 
these individuals interacting through their percep- 
tual systems. 



We show that this model derives social rules 
that economically explain detailed experiments of 
decision-making in animal groups I13JI33]. This approach
should complement the empirical approach
used in the study of animal groups j42H47j , finding 
which mathematical functions should correspond to 
each experimental problem and to propose exper- 
iments relating estimation and collective motion. 
The Bayesian structure of our model also builds a 
bridge between the field of collective behavior and 
other fields of animal behavior, such as optimal for- 
aging theory [THHH] and others pTll^ . Further, it 
explicitly includes in a natural way different cogni- 
tive abilities, making more direct contact with neu- 
robiology and psychology [3 HT0lfT7] . 



Results 

Estimation model 

We derived a model in which each individual de- 
cides from an estimation of which behavior is best 
to perform. These behaviors can be to go to one 
of several different places, to choose among some 
behaviors like forage, explore or run away, or any 
other set of options. For clarity, here we particular- 
ize to the case of choosing the best of two spatial 
locations, x and y (see Text S1 for more than two
options). 'Best' may correspond to the safest, the 
one with highest food density or most interesting for 
any other reasons. We assume that each decision 
maker uses in the estimation of the best location 
both non-social and social information. Non-social 
information may include sensory information about 
the environment (i.e. shelter properties, potential 
predators, food items), memory of previous experi- 
ences and internal states. Social information con- 
sists of the behaviors performed by other decision- 
makers. Each individual estimates the probability 
that each location, say y, is the best one, using its
non-social information (C) and the behavior of the
other individuals (B),

\[ P(Y|C,B), \tag{1} \]

where Y stands for 'y is the best location'.
P(X|C,B) = 1 - P(Y|C,B), because there are only
two locations to choose from. We can compute the
probability in Eq. 1 using Bayes' theorem,



\[ P(Y|C,B) = \frac{P(B|Y,C)\,P(Y|C)}{P(B|X,C)\,P(X|C) + P(B|Y,C)\,P(Y|C)}. \tag{2} \]

By simply dividing numerator and denominator by
the numerator we find an interesting structure,

\[ P(Y|C,B) = \frac{1}{1 + aS}, \tag{3} \]

where

\[ a = \frac{P(X|C)}{P(Y|C)} \tag{4} \]

and

\[ S = \frac{P(B|X,C)}{P(B|Y,C)}. \tag{5} \]

Note that a does not contain any social information 
so it can be understood as the "non-social term" 
of the estimation. We can also understand S as 
the "social term" because it contains all the social 
information, although it also depends on the non-social
information C. The non-social term a is the
likelihood ratio for the two options given only the 
non-social information. This kind of likelihood ra- 
tio is the basis of Bayesian decision-making in the 
absence of social information [5J[TTHI1]. Eq. 3 now
tells us that this well known term interacts with the 
social term S simply through multiplication. 
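
The equivalence between Bayes' theorem in Eq. 2 and the factored form in Eqs. 3-5 can be checked numerically. The following is a minimal sketch, not part of the original analysis; the function names and the numerical values of the prior and likelihoods are hypothetical and chosen only for illustration.

```python
def posterior_y_direct(prior_y, lik_b_given_x, lik_b_given_y):
    """Eq. 2: Bayes' theorem for two options, before any simplification."""
    prior_x = 1.0 - prior_y
    numerator = lik_b_given_y * prior_y
    return numerator / (lik_b_given_x * prior_x + numerator)


def posterior_y_factored(prior_y, lik_b_given_x, lik_b_given_y):
    """Eq. 3: the same posterior written as 1 / (1 + a * S)."""
    a = (1.0 - prior_y) / prior_y        # non-social term, Eq. 4
    S = lik_b_given_x / lik_b_given_y    # social term, Eq. 5
    return 1.0 / (1.0 + a * S)


# Hypothetical inputs: P(Y|C) = 0.4, P(B|X,C) = 0.1, P(B|Y,C) = 0.3.
p_direct = posterior_y_direct(0.4, 0.1, 0.3)
p_factored = posterior_y_factored(0.4, 0.1, 0.3)
print(p_direct, p_factored)
```

Both functions return the same value because the second is obtained from the first by dividing numerator and denominator by P(B|Y,C)P(Y|C).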

We are seeking a model based on probabilistic
estimation that can simultaneously give us insight 
into social decision-making and fit experimental 
data. For this reason we simplify the model by as- 
suming that the focal individual does not make use 
of the correlations among the behaviour of others, 
but instead assumes their behaviours to be indepen- 
dent of each other. This is a strong hypothesis but 
allows us to derive simple explicit expressions with 
important insights. The section 'Model including 
dependencies' at the end of Results shows that this 
assumption gives a very good approximation to a 
more complete model that takes into account these 
correlations. 

The assumption of independence means that the
probability of a given set of behaviors is
just the product of the probabilities of the individual
behaviors. We apply it to the probabilities
needed to compute S in Eq. 5, getting



\[ P(B|Y,C) = Z \prod_{i=1}^{N} P(b_i|Y,C), \tag{6} \]



where B is the set of all the behaviors of the other 
N animals at the time the focal individual chooses, 
B = {b_i}, i = 1, ..., N, and b_i denotes the behavior of one
of them, individual i. Z is a combinatorial term
counting the number of possible decision sequences
that lead to the set of behaviors B, which will cancel
out in the next step. Substituting Eq. 6 and the
corresponding expression for P(B|X,C) into Eq. 5
we get

\[ S = \prod_{i=1}^{N} \frac{P(b_i|X,C)}{P(b_i|Y,C)}. \tag{7} \]



Instead of an expression in terms of as many be- 
haviors as individuals, it may be more useful to 
consider a discrete set of behavioral classes. For 
example, in our two-choice example, these behavioral
classes may be 'go to x' (denoted β_x), 'go to
y' (β_y) and 'remain undecided' (β_u). Frequently,
these behavioral classes (or simply 'behaviors') will
be directly related to the choices, so that each behavior
will consist of choosing one option. For example,
behaviors β_x and β_y are directly related to
choices x and y, respectively. But there may be
behaviors not related to any option, as in the case
of indecision, β_u, or related to choices in an indirect
way. These behaviors can still be informative
because they may be more consistent with one of
the options being better than the other (for example,
indecision may increase when there is a predator,
so the presence of undecided individuals may
bias the decision against the place where the non-social
information suggests the presence of a predator).
Let us consider L different behavioral classes,
{β_k}, k = 1, ..., L. We do not here consider individual
differences for animals performing the same behavior
(say, behavior β_1), so they have the same probabilities
P(β_1|X,C) and P(β_1|Y,C). Thus, if for example
the first n_1 individuals are performing behavior
β_1, we have that

\[ \prod_{i=1}^{n_1} \frac{P(b_i|X,C)}{P(b_i|Y,C)} = \left( \frac{P(\beta_1|X,C)}{P(\beta_1|Y,C)} \right)^{n_1}. \]

We can then write Eq. 7 as



\[ S = \prod_{k=1}^{L} s_k^{n_k}, \tag{8} \]

where n_k is the number of individuals performing
behavior β_k, and

\[ s_k = \frac{P(\beta_k|X,C)}{P(\beta_k|Y,C)}. \tag{9} \]

The term s_k is the probability that an individual
performs behavior β_k when x is the best option,
over the probability that it performs the same behavior
when y is the best choice. The higher s_k, the
more reliably behavior β_k indicates that x is better
than y, so we can understand s_k as the reliability
parameter of behavior β_k. If s_k = ∞, observing behavior
β_k indicates with complete certainty that x
is the best option, while for s_k = 1 behavior β_k gives
no information. For s_k < 1, observing behavior β_k
favors y as the best option, and more so the closer
s_k is to 0. Note that P(β_k|X,C) and P(β_k|Y,C)
are not the actual probabilities of performing behavior
β_k, but estimates of these probabilities that
the deciding animal uses to assess the reliability of 
the other decision-makers. These estimates may be 
'hard-wired' as a result of evolutionary adaptation, 
but may also be subject to change due to learning. 

To summarize, using Eqs. 3 and 8, the probability
that y is the best choice, given both social and
non-social information, is

\[ P(Y|C,B) = \left( 1 + a \prod_{k=1}^{L} s_k^{n_k} \right)^{-1}, \tag{10} \]

with a in Eq. 4 and s_k in Eq. 9.
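
As an illustration of Eqs. 8 and 10, the sketch below computes the social term from counts of behavioral classes and checks that it equals the per-individual product of Eq. 7. The function names, the class labels and the reliability values are hypothetical and only serve the example.

```python
import math


def social_term(reliabilities, counts):
    """Eq. 8: S = prod_k s_k ** n_k over behavioral classes."""
    return math.prod(s ** counts.get(k, 0) for k, s in reliabilities.items())


def prob_y_best(a, reliabilities, counts):
    """Eq. 10: P(Y|C,B) = 1 / (1 + a * S)."""
    return 1.0 / (1.0 + a * social_term(reliabilities, counts))


# Hypothetical reliabilities: 'x' favors x (s > 1), 'y' favors y (s < 1),
# 'u' (undecided) is uninformative (s = 1).
reliabilities = {"x": 2.5, "y": 0.4, "u": 1.0}
observed = ["x", "y", "y", "y", "u", "u"]          # behaviors of six other animals
counts = {k: observed.count(k) for k in reliabilities}

S_by_class = social_term(reliabilities, counts)
S_by_individual = math.prod(reliabilities[b] for b in observed)  # Eq. 7
p_y = prob_y_best(1.0, reliabilities, counts)
print(S_by_class, S_by_individual, p_y)
```

Grouping individuals into classes changes nothing in the result; it only replaces N factors by L powers.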

Decision rule: Probability matching 

We have so far only considered the perceptual stage 
of decision-making, in which the deciding individ- 
ual estimates the probability that each behavior is 
the best one. Now it must decide according to this 
estimation. A simple decision rule would be to go 
to y when P{Y\C,B) is above a certain threshold. 
This rule maximizes the amount of correct choices 
when the probabilities do not change [35], but is 
not consistent with the experimental data consid- 
ered in this paper. Applying this deterministic rule 
strictly, without any noise sources, one would ob- 
tain that all individuals behave exactly in the same 
way when facing the same stimuli, but in the exper- 
iments considered here this is not the case. Instead, 
we used a different decision rule called probability 
matching, that has been experimentally observed 
in many species, from insects to humans [49H55j . 
According to this rule an individual chooses each 
option with a probability that is equal to the prob- 
ability that it is the best choice. Therefore, in our 
case the probability of going to y (P_y) is the same
as the estimated probability that y is the best location
(P(Y|C,B)), so

\[ P_y = P(Y|C,B). \tag{11} \]

Probability matching does not maximize the 
amount of right choices if we assume that the prob- 
abilities stay always the same, but in many circum- 
stances it can be the optimal behavior, such as when 
there is competition for resources (SHISTj, when the
estimated probabilities are expected to change due
to learning [531155], or for other reasons [531158].

Finally, using Eqs. 10 and 11 we have that the
probability that the deciding individual goes to y is

\[ P_y = \left( 1 + a \prod_{k=1}^{L} s_k^{n_k} \right)^{-1}. \tag{12} \]

The assumption of probability matching has the ad- 
vantage that the final expression for the decision in 
Eq. 12 is identical to the one given by Bayesian estimation
in Eq. 10, with no extra parameters. Alternative
decision rules could be noisy versions of the
threshold rule, but at the price of adding at least 
one extra parameter to describe the noise. Also, de- 
cision rules might not depend on estimation alone, 
but also on other factors or constraints. These more 
complicated rules fall beyond the scope of this pa- 
per. 
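
The contrast between the deterministic threshold rule and probability matching can be sketched as follows. This is an illustrative simulation under assumed values (an estimated probability of 0.7, a threshold of 0.5 and a fixed random seed), with hypothetical function names; it is not the procedure used to analyse the experiments.

```python
import random


def choose_threshold(p_y, threshold=0.5):
    """Deterministic rule: always pick y when the estimate exceeds the threshold."""
    return "y" if p_y > threshold else "x"


def choose_matching(p_y, rng):
    """Probability matching (Eq. 11): pick y with probability p_y."""
    return "y" if rng.random() < p_y else "x"


rng = random.Random(0)   # fixed seed so the sketch is reproducible
p_y = 0.7                # hypothetical estimated probability that y is best
n_trials = 10_000

threshold_choices = {choose_threshold(p_y) for _ in range(n_trials)}
matching_fraction = sum(
    choose_matching(p_y, rng) == "y" for _ in range(n_trials)
) / n_trials
print(threshold_choices, matching_fraction)
```

Under the threshold rule every individual facing the same stimuli makes the same choice, whereas under probability matching the fraction choosing y approaches the estimated probability itself.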

In the following sections, we particularize Eq. 12
to different experimental settings to test its results 
against existing rich experimental data sets that 
have previously been fitted to different mathemati- 
cal expressions [^H5] . 

Symmetric set-up 

We first considered the simple case of two identical
equidistant sites, x and y (Fig. 1A). For a set-up
made symmetric by experimental design there
is no true best option. But deciding individuals 
must act, like for any other case, using only their 
incomplete sensory data to make the best possible 
decision. Even when non-social sensory data indi- 
cates no relevant difference between the two sites, 
the social information can bias the estimation of the 
best option to one of the two sites. 

Using Eq. 12 and that the three possible behaviors
are 'go to x' (β_x), 'go to y' (β_y) and 'remain
undecided' (β_u), we obtain

\[ P_y = \left( 1 + a\, s_x^{n_x} s_y^{n_y} s_u^{N - n_x - n_y} \right)^{-1}, \tag{13} \]

[Figure 1 panel labels: Site x; Site y; P_y = P(Y | social info) = (1 + s^{-Δn})^{-1}; P_x = 1 - P_y = (1 + s^{Δn})^{-1}; Δn = n_y - n_x]

Figure 1. Model with individuals estimating which
of two identical places is best. (A) Schematic diagram
of individuals choosing between two identical locations x and
y when there are already n_x (n_y) individuals at x (y). (B)
Probability of going to y as a function of the difference between
the number of individuals at y and x, Eq. 17. (C) Sequential
application of the behavioural rule in Eq. 17 with
s = 2.5, for the simple case of a group of two individuals
(bottom). The width of the arrows is proportional to the
probability of each transition. The 3 possible final configurations,
with different proportion of individuals going to y
(0, 0.5 and 1), have different probabilities of taking place,
with both fish together at x or y being more probable than
a group split (top).



where n_x and n_y are the number of individuals
that have already chosen x and y, respectively, and
N + 1 is the size of the group containing our focal
individual and other N animals. As the set-up
is symmetric, the sensory information available
to the deciding individual is the same for both options,
so P(X|C) = P(Y|C) and then a = 1 according
to Eq. 4. Also, since indecision is not related
to any particular choice, symmetry imposes
P(β_u|X,C) = P(β_u|Y,C), so indecision is not informative,
s_u = 1 (Eq. 9). For the other two behaviors,
going to x (β_x) and going to y (β_y), Eq. 9
gives

\[ s_x = \frac{P(\beta_x|X,C)}{P(\beta_x|Y,C)}, \qquad s_y = \frac{P(\beta_y|X,C)}{P(\beta_y|Y,C)}. \tag{14} \]



P(β_x|X,C) and P(β_y|Y,C) are the estimated probabilities
of making the right choice, that is, going
to x when x is the best option, or going to y when
y is the best option. Since in this case the sensory
information is identical for both options, the probability
of making the correct choice must be the same
for both options, P(β_x|X,C) = P(β_y|Y,C). An
analogous argument holds for the incorrect choices,
P(β_x|Y,C) = P(β_y|X,C), giving

\[ s_x = 1/s_y. \tag{15} \]



In cases in which s_x = 1/s_y, we find it convenient
to express reliability more generally as

\[ s \equiv s_x = 1/s_y, \tag{16} \]

which is the ratio of the probability of making the
correct choice and the probability of making a mistake,
for both behaviors. Using this definition and
given that a = s_u = 1, Eq. 13 reduces to

\[ P_y = \left( 1 + s^{-\Delta n} \right)^{-1}, \tag{17} \]

with the variable Δn = n_y - n_x. Eq. 17 describes
a sigmoidal function that is steeper the
higher the value of s (Fig. 1B). Therefore, for very
reliable behaviors (high s, meaning individuals that
are much more likely to make correct choices than
erroneous ones), P_y grows fast with Δn and the
deciding individual then goes to y with high probability
when taking into account the behaviors of
only very few individuals.
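
The sigmoidal rule of Eq. 17 is simple enough to tabulate directly. The sketch below is illustrative only; the function name `p_go_y` and the particular values of s are assumptions made for the example.

```python
def p_go_y(delta_n, s):
    """Eq. 17: probability of going to y given delta_n = n_y - n_x."""
    return 1.0 / (1.0 + s ** (-delta_n))


# Tabulate the rule for a few hypothetical reliabilities.
for s in (1.5, 2.5, 4.0):
    row = [round(p_go_y(dn, s), 3) for dn in range(-3, 4)]
    print(f"s = {s}: {row}")
```

For every s the rule gives 0.5 at Δn = 0 and satisfies P_y(Δn) + P_y(-Δn) = 1, while larger s makes the transition between the two options sharper.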

The behavior of the group is obtained by applying
the decision rule in Eq. 17 sequentially to each
individual (see Methods). After each behavioural
choice, we update the number of individuals at x
and y, using the new n_x and n_y for the next deciding
individual (Fig. 1C, bottom). Repeating this
procedure for all the individuals in the group, we
can compute the probability for each possible final
outcome of the experiment (Fig. 1C, top).
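
The sequential procedure can be sketched as a small dynamic program over the number of individuals that have already chosen y. This is a minimal illustration assuming that individuals decide one at a time and that each sees all previous choices; the function names and the optional replica counts `m_x`, `m_y` are hypothetical conveniences, and the Methods of the paper remain the reference for the actual procedure.

```python
import math


def p_go_y(delta_n, s):
    """Eq. 17: probability of going to y given delta_n = n_y - n_x."""
    return 1.0 / (1.0 + s ** (-delta_n))


def outcome_probs(n_fish, s, m_x=0, m_y=0):
    """Probabilities that 0, 1, ..., n_fish deciding fish end up at y.

    m_x and m_y are individuals (e.g. replicas) already committed to x and y.
    """
    probs = [1.0]  # probs[j]: probability that j of the fish so far chose y
    for decided in range(n_fish):
        new = [0.0] * (decided + 2)
        for j, p in enumerate(probs):
            delta_n = (m_y + j) - (m_x + decided - j)
            p_y = p_go_y(delta_n, s)
            new[j + 1] += p * p_y          # next fish goes to y
            new[j] += p * (1.0 - p_y)      # next fish goes to x
        probs = new
    return probs


two_fish = outcome_probs(2, 2.5)     # the case of Fig. 1C
eight_fish = outcome_probs(8, 2.5)
non_social = outcome_probs(4, 1.0)   # s = 1: choices are independent coin flips
print(two_fish)
```

For two fish and s = 2.5 the two outcomes with both fish together are each more probable than the split, and for s = 1 the procedure reduces to a binomial distribution.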

The relevance of the symmetric case is that the 
model has a single parameter and a single variable, 
enabling a powerful comparison against experimental
data. We tested the model using an existing
rich data set of collective decisions in three-spined
sticklebacks, a shoaling fish species. This data
set was obtained using a group of N_tot fish choosing
between two identical refugia, one on their left and
another one on their right (Fig. 2A), equivalent to
locations x and y in the model (Fig. 1A). At the
start of the experiment, m_x (m_y) replica fish made
of resin were moved along lines on the left (right)
towards the refugia (Fig. 2A). The experimental results
consisted of the statistics of collective decisions
between the two refugia for 19 different cases
using different group sizes N_tot = 2, 4 or 8 and
different numbers of replicas going left and right,
m_x:m_y = {1:1, 2:2, 0:1, 1:2, 0:2, 1:3, 0:3} (Fig. 2B,
blue histograms). To compare against these experimental
data, we calculated the probability of finding
a collective pattern applying the individual behavioural
rule in Eq. 17 iteratively over each fish
for the 19 experimental settings. We found a good
fit of the model to the experimental data using for
the 19 graphs the same value s = 2.2 (Fig. 2B, red
line). The model is robust, with good fits in the
interval s = 2-4 (Fig. 3, red line).
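
One way to picture how a single value of s can be selected is a grid search over the log-probability of observed outcome counts under the model. The sketch below is an assumption-laden illustration: it uses synthetic pseudo-data generated from the model itself (expected counts at s = 2.5 for four fish and one replica at y), not the stickleback data, and the function names and grid are hypothetical.

```python
import math


def p_go_y(delta_n, s):
    """Eq. 17: probability of going to y given delta_n = n_y - n_x."""
    return 1.0 / (1.0 + s ** (-delta_n))


def outcome_probs(n_fish, s, m_x=0, m_y=0):
    """Probabilities that 0..n_fish fish end at y, with m_x:m_y replicas."""
    probs = [1.0]
    for decided in range(n_fish):
        new = [0.0] * (decided + 2)
        for j, p in enumerate(probs):
            p_y = p_go_y((m_y + j) - (m_x + decided - j), s)
            new[j + 1] += p * p_y
            new[j] += p * (1.0 - p_y)
        probs = new
    return probs


def log_likelihood(counts, s, n_fish, m_x, m_y):
    """Multinomial log-probability of the counts, up to a constant in s."""
    probs = outcome_probs(n_fish, s, m_x, m_y)
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)


# Synthetic pseudo-data (NOT experimental): expected counts under s = 2.5.
true_s = 2.5
counts = [1000.0 * p for p in outcome_probs(4, true_s, 0, 1)]

grid = [i / 10 for i in range(10, 51)]   # candidate values s = 1.0, 1.1, ..., 5.0
best_s = max(grid, key=lambda s: log_likelihood(counts, s, 4, 0, 1))
print(best_s)
```

With real data the counts would come from the experimental repetitions and the log-probabilities of all experimental settings would be summed before maximizing over s.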

Despite the simplicity of the behavioral rule in
Eq. 17, it reproduces the experimental results, including
the dependence on the total number of fish
N_tot, even though the rule is independent of this
parameter, except for determining the range of possible
values of Δn. The dependence of the final distributions
on N_tot emerges from the application of
the rule to the N_tot individuals in the group, as is
illustrated in Fig. 4. Each small box represents a
state of the system in which n_x:n_y fish have already
decided to go to x and y, respectively. The
lines connecting each box with another two boxes
on top represent the decision made by the next deciding
individual, which takes the system to the next
state. The width of the lines is proportional to the
probability of the decision. As more individuals decide,
the central states become less likely simply
because they accumulate more unlikely decisions.
Therefore, the U-shape or J-shape becomes more
pronounced for larger groups, even though the individual
decision rule in Eq. 17 is independent of
the total number of individuals N_tot.

Group decision-making in three-spined sticklebacks
shows a single type of distribution in which
probability is minimum at the center and increases
monotonically towards the edges, denoted here as
U-shaped distribution (or J-shaped when there is
a bias to one of the two options). However, the
model in Eq. 17 also gives two other types of distributions,
Fig. 5A. For non-social behavior (s = 1)
the histogram is bell-shaped due to combinatorial
effects. However, a bell-shape is also compatible
with social animals for a certain range of s
and group size (white region on the bottom-left of

Figure 2. Comparison between model and stickleback
choices in symmetric set-up. (A) Schematic diagram
of symmetric set-up with a group of sticklebacks (in
black) choosing between two identical refugia and with different
numbers of replica fish (in red) going to x and y. (B)
Experimentally measured statistics of final configurations of
fish choices from 20 experimental repetitions [42] (blue histogram)
and results from the model in Eq. 17 in the main
text (red line using reliability parameter s = 2.2; red region:
95% confidence interval; green line with s = 2.5). Different
graphs correspond to different stickleback group sizes and
different number of replicas going to x and y.

Fig. 5A). For higher values of s, the histograms are
[Figure residue: legend entries Symmetric (Fig. 2), Manipulated replicas (Fig. 6), Predator (Fig. 7), Global; axis label 'Proportion going to y']

Figure 3. Goodness of fit for different values of the
reliability (s). Red: Symmetric case (plots in Fig. 2).
Green: Case with different replicas at each side (plots in
Fig. 6. The ratios s_r/s_R are re-optimized for each value of
s). Blue: Asymmetric set-up with predator on one side
(plots in Fig. 7. Parameter a is re-optimized for each value
of s). (A) Root mean squared error between the data and the
probabilities predicted by the model. Grey dashed line shows
the mean RMSE for the three cases. The absolute values
for each case depend on the shape of the data and are not
comparable; only the trends and the position of the minima
should be compared. (B) Logarithm of the probability that
the data come from the model. The height of each curve
depends on the number of data for each experiment; only the
trend and the position of the maxima should be compared.
Grey dashed line shows the sum of the three coloured lines,
but shifted by 1000 so that it fits on the scale. The peak of
this global probability indicates the value of s that best fits
the three datasets (s = 2.5).



M-shaped, with two maxima located between the
center and the sides (region coloured in black and
blue in Fig. 5A). However, the M shape becomes
clear only with a sufficient number of bins because the
drop in probability near the edge or at the center
of the distribution disappears when binning is too
coarse, producing a bell-shaped or U-shaped histogram
(Fig. 5B). This is an important practical issue,
because the amount of data that can be collected
rarely allows for more than 5 bins. The colorscale
in Fig. 5A reflects the number of bins needed
to observe the M shape (black has been reserved for

Figure 4. Illustration of the decision-making process
in the model. Bottom: Decision-making process according
to Eq. 17 (with s = 2.5). Time runs from bottom to top.
Each box represents a state with a given number of fish having
already decided x or y (n_x : n_y). Each state can lead to
another two states in the following time step, depending on
whether the focal fish decides to go to x or y. The width of
the lines connecting states is proportional to the probability
of that transition (equal to the probability of the prior state
times the probability of the focal fish making the decision
that leads to the later one). Top: Probability of each state
after 8 fish have made their decisions. (A) Case with no
replicas, in which the final outcome is U-shaped. (B) Case
with one replica going to y (so initial state is already 0:1), in
which the final outcome is J-shaped.

exactly 5 bins). For high values of s, the histograms
are U-shaped (white region on the top of Fig. 5A).
Also, all the M-region above the black zone becomes
of type U when the binning is too coarse.

An interesting prediction of our model is that, for 
a given number of bins, the shape of the distribu- 
tion of choices changes with the number of decided 
individuals, and the dynamics of this change depends
on s. For high values of s, the probability is
U-shaped from the beginning and becomes steeper
as more individuals decide (as is the case for the
stickleback dataset), Fig. 5C. For lower values of
s, we observe M-shaped distributions for the first
individuals and then U-shaped ones when more individuals
decide, Fig. 5D. For even lower values of
s, we observe bell-shaped distributions for the first
individuals, then M-shaped and finally U-shaped,
Fig. 5E,F.

Figure 5. Types of distributions and dynamics for
different values of reliability parameter s and group
size. (A) Shape of histogram of final configurations as a
function of s and the group size. Bell-shaped: white region
on the bottom-left. M-shaped: region coloured in black and
blue. As the observation of the M shape depends on the
number of bins, the colorscale reflects the number of bins
needed to observe the M shape (black has been reserved for
exactly 5 bins). U-shape: white region on the top. Also,
all the M-region above the black zone becomes U when the
binning is too coarse. There is also a small region below the
black zone where the M shape becomes a bell shape when
the binning is too coarse. (B) Dependence of the apparent
shape on the number of bins: Top, 80 bins. Middle, 10 bins.
Bottom, 5 bins. On the left, a probability that seems U-shaped
for 5 bins, but is M-shaped for a higher number of
bins. On the right, a probability that stays M-shaped for any
number of bins. (C-F) Dynamics of the probability as the
number of individuals increases for (C) s = 2, (D) s = 1.62,
(E) s = 1.35 and (F) s = 1.05.

Symmetric set-up with modified replicas of animals

An interesting modification of the experimental set-up
consists in using replicas of the animals that we
can modify to potentially alter their reliability as
estimated by the animals. We considered the particular
case, motivated by experiments in [43], of
two types of modified replicas with different characteristics
(for example, fat or thin), Fig. 6. We
considered 7 behaviors: 'animal goes to x' (β_fx),
'animal goes to y' (β_fy), 'most attractive replica
goes to x' (β_Rx), 'most attractive replica goes to
y' (β_Ry), 'least attractive replica goes to x' (β_rx),
'least attractive replica goes to y' (β_ry), and 'animal
remains undecided' (β_fu). The probability of
going to y in Eq. 12 then reduces to

\[ P_y = \left( 1 + a\, s_{fx}^{n_{fx}} s_{fy}^{n_{fy}} s_{Rx}^{n_{Rx}} s_{Ry}^{n_{Ry}} s_{rx}^{n_{rx}} s_{ry}^{n_{ry}} s_{fu}^{n_{fu}} \right)^{-1}, \tag{18} \]

where subindex 'f' refers to real fish and 'R' ('r') to
replicas of the most (least) attractive type. As in
the previous section, symmetry imposes that a = 1
and s_fu = 1. It also imposes the following relations
between the reliability parameters: s_f = s_fx = 1/s_fy,
s_R = s_Rx = 1/s_Ry, s_r = s_rx = 1/s_ry. Therefore,

\[ P_y = \left( 1 + s_f^{-\Delta n_f} s_R^{-\Delta n_R} s_r^{-\Delta n_r} \right)^{-1}, \tag{19} \]

where Δn_f = n_fy - n_fx, Δn_R = n_Ry - n_Rx and
Δn_r = n_ry - n_rx. In the particular case of only two
different replicas, one going to x and the other to y,
and for notational simplicity taking the convention
that the most (least) attractive replica goes to y
(x), we have Δn_R = 1 and Δn_r = -1. Therefore,

\[ P_y = \left( 1 + \frac{s_r}{s_R}\, s_f^{-\Delta n_f} \right)^{-1}. \tag{20} \]

Note that the probability in Eq. 20 does not depend
on s_R and s_r separately, but only on their ratio.
Therefore, in this case the model uses only two
parameters (s_f and s_r/s_R). We compared the model
with the stickleback data set from [33], Fig. 6. The
data in Fig. 6 has a different type of replica pair in
each row, so in principle we would fit a different ratio
s_r/s_R for each row. But note that the first three
rows correspond to experiments with the same three
replicas (large, medium and small), combined in different
pairs. The same can be said for the second
and third threesomes of rows. Therefore, there are
only two free parameters for each three rows. On
the other hand, s_f should have the same value for all
cases. The model again reproduces the experimental
results reported in reference [33], obtaining the
best fit for s_f = 2.9 (Fig. 6B). The result is robust,
with good fits for s_f = 2-4 (Fig. 3, green line), in
accord with the value obtained for the case shown
in Fig. 2B.
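
The reduction from Eq. 19 to Eq. 20, and the fact that only the ratio s_r/s_R matters for a single replica pair, can be checked with a short sketch. The function names and the values of s_R and s_r below are hypothetical; only s_f = 2.9 is taken from the text.

```python
def p_go_y_replicas(dn_f, dn_R, dn_r, s_f, s_R, s_r):
    """Eq. 19: real fish plus most (R) and least (r) attractive replicas."""
    return 1.0 / (1.0 + s_f ** (-dn_f) * s_R ** (-dn_R) * s_r ** (-dn_r))


def p_go_y_pair(dn_f, s_f, ratio):
    """Eq. 20: one replica pair, with ratio = s_r / s_R."""
    return 1.0 / (1.0 + ratio * s_f ** (-dn_f))


s_f = 2.9
# Most attractive replica at y (dn_R = 1), least attractive at x (dn_r = -1).
full = p_go_y_replicas(0, 1, -1, s_f, s_R=4.0, s_r=2.0)
rescaled = p_go_y_replicas(0, 1, -1, s_f, s_R=8.0, s_r=4.0)  # same ratio 0.5
reduced = p_go_y_pair(0, s_f, ratio=0.5)
print(full, rescaled, reduced)
```

Rescaling both replica reliabilities by the same factor leaves the probability unchanged, which is why a single ratio is fitted per replica pair.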

Asymmetric set-up 

We finally considered the case in which sites x and
y are different and the three behaviors are 'go to
x' (β_x), 'go to y' (β_y) and 'remain undecided' (β_u).
Eq. 12 reduces to

\[ P_y = \frac{1}{1 + a\, s_x^{n_x} s_y^{n_y} s_u^{N - n_x - n_y}}. \tag{21} \]



The term a = P(X|C)/P(Y|C) represents the non-social
information and in general a ≠ 1 because the
set-up is asymmetric by design. This asymmetry
might also affect how a deciding animal takes into
account the behaviours of other animals depending
on which side they chose, making in general
s_x ≠ 1/s_y. Also, indecision might be informative.
For example, if non-social information indicates the
possible presence of a predator at y, the indecision
of other animals might confirm this to the deciding
individual, further biasing the decision towards x.
Therefore, we may have s_u ≠ 1. But it may also
be the case that the set-up's asymmetry does not
affect the social terms, so we also tested a simpler
model in which s = s_x = 1/s_y and s_u = 1, giving

\[ P_y = \frac{1}{1 + a\, s^{-\Delta n}}. \tag{22} \]



The stickleback dataset reported in reference [42] is ideally suited to test the asymmetric model for the experiments that were performed with a replica predator at the right arm (Fig. 7A). The model in Eq. 22 fits the data best with s = 2.6 (Fig. 7B) and it is robust, with a good fit for s = 2-4 (Fig. 3, blue line). The more complex model in Eq. 21 gives fits very similar to those of the simpler model. Specifically, parameter $s_u$ was rejected by the Bayes Information Criterion [59,60], suggesting that fish do not rely on undecided individuals. The fact that fish rely differently on other fish depending on the option they have taken could not be ruled out by the Bayes Information Criterion, but in any case the impact of this difference on the data is small.
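
To make the role of the two parameters concrete, the simplified asymmetric rule (Eq. 22) can be written as a short Python function. This is only a sketch under our own naming assumptions (`p_go_to_y`, `n_x`, `n_y`), not the code of Protocol S1; the numerical values are the best-fit parameters quoted above for the predator set-up.

```python
def p_go_to_y(n_x, n_y, a=1.0, s=2.5):
    """Probability that the deciding animal goes to y (Eq. 22).

    n_x, n_y: animals (fish or replicas) that already chose x and y.
    a: non-social term P(X|C)/P(Y|C); a > 1 favours x.
    s: social reliability parameter.
    """
    delta_n = n_y - n_x
    return 1.0 / (1.0 + a * s ** (-delta_n))


# Predator at y (a = 9.5, s = 2.6): effect of replicas already going to y.
for replicas_at_y in range(4):
    print(replicas_at_y, round(p_go_to_y(0, replicas_at_y, a=9.5, s=2.6), 3))
```

With $a = 1$ the function reduces to the symmetric rule, so the same code covers both set-ups when dependencies are neglected.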

In the experiments in Fig. 2 and Fig. 7 we have assumed that the replicas are perceived by fish as real animals. However, it is reasonable to think that fish might perceive the difference, and rely differently on replicas and real fish. To test this, we considered different behaviors for fish and replicas, such as 'fish goes to x' and 'replica goes to x'. Making that distinction, we get that Eq. 12 reduces to

$$P_y = \left(1 + a\, s_{fx}^{n_{fx}} s_{fy}^{n_{fy}} s_{rx}^{n_{rx}} s_{ry}^{n_{ry}} s_{fu}^{n_{fu}}\right)^{-1} \qquad (23)$$

The Bayes Information Criterion rejects only parameter $s_{fu}$. However, the addition of the new parameters that distinguish replicas from real fish gives very small improvements in the fits compared to the results of the simpler models in Eq. 17 and Eq. 22 (see Figs. S1 and S3), suggesting that fish follow replicas as much as they follow real fish.

Figure 6. Comparison between model and stickleback choices with two differently modified replicas. (A) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia and with one replica fish going to x and a different one (in size, shape or pattern) going to y (in red). (B) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [43] (blue histogram) and results from model in Eq. 20 in the main text (red line using reliability parameter $s_f = 2.9$ and $s_r/s_R$ = 0.35, 0.7, 0.5, 0.52, 0.69, 0.75, 0.43, 0.55, 0.78, 0.43, for each row from top to bottom; red region: 95% confidence interval; green line with $s_f = 2.5$ and same ratios $s_r/s_R$ as for red line). Different graphs correspond to different stickleback group sizes and different types of replicas going to x and y.

Figure 7. Comparison between model and stickleback choices in asymmetric set-up. (A) Schematic diagram of asymmetric set-up (predator at y, large fish depicted in red) with a group of sticklebacks (in black) choosing between two refugia, and replica fish (small fish depicted in red) going to y. (B) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from model in Eq. 22 in the main text (red line using s = 2.6, a = 9.5; red region: 95% confidence interval. Green line using s = 2.5 and same a as for red line). Different graphs correspond to different stickleback group sizes and different number of replicas going to y.



Model including dependencies 

In this section we remove the hypothesis of independence among the behaviors of the other individuals (Eq. 6). We now consider that the focal individual takes into account not only the behaviors of the other animals at the time of decision, but also the specific sequence of decisions that has taken place before, $\{b_i\}_{i=1}^{K-1}$, where $K-1$ is the number of individuals that have decided before the focal one. For example, the sequence {x, y} may give different information to the focal individual than the sequence {y, x}. This is illustrated in Fig. 8A, where there are two possible paths leading to states labeled as 1:1, but these two states are in different branches of the tree (in contrast with Fig. 1, in which these two states were collapsed in a single one).

Figure 8. Model taking into account dependencies. (A) Decision-making process according to the model with dependencies, Eqs. 25-33. Time runs from bottom to top. Each box represents one state, and each edge represents one option of the deciding individual, which either goes to x or to y. Edge width is proportional to the probability of the decision. (B) Probability of choosing y as a function of the difference of the number of individuals that have already chosen each option ($\Delta n = n_y - n_x$), for $a'_x = 5$. In the new model the probability does not depend any more on $\Delta n$ alone, so states with the same $\Delta n$ have different values for the probability (black dots). The area of the dots is proportional to the probability of observing each state. Red line shows the expected value of the probability for each value of $\Delta n$. The green line shows the probability for the model that neglects dependencies (Eq. 17), $(1 + s^{-\Delta n})^{-1}$, for s = 2.5.



To calculate the probability of the observed sequence of behaviors provided that Y is the correct choice,

$$P\left(\{b_i\}_{i=1}^{K-1} \mid Y, C_K\right), \qquad (24)$$

one can apply $P(A, B) = P(A|B)P(B)$ repeatedly to obtain

$$P\left(\{b_i\}_{i=1}^{K-1} \mid Y, C_K\right) = \prod_{k=1}^{K-1} P\left(b_k \mid Y, C_K, \{b_i\}_{i=1}^{k-1}\right). \qquad (25)$$

This expression substitutes the assumption of independence in Eq. 6. Each of the terms in the product is simply the probability that the k-th individual makes its decision, given the previous decisions, and also given that y is the correct choice. This result was expected since, if we look at the tree in Fig. 8A, we see that the probability of reaching a given state is simply the product of the probabilities of choosing the adequate branches in each step.

So the problem reduces to computing the individual decision probabilities $P(b_k \mid Y, C_K, \{b_i\}_{i=1}^{k-1})$. We assume in the following that these probabilities are calculated by the focal individual by assuming that all animals use the same rules to make a decision. The rule for the focal individual is, as in previous sections,

$$P_y = \frac{1}{1 + a_K S_K}, \qquad (26)$$

where the non-social and social terms are

$$a_K = \frac{P(X \mid C_K)}{P(Y \mid C_K)} \qquad (27)$$

and

$$S_K = \frac{P\left(\{b_i\}_{i=1}^{K-1} \mid X, C_K\right)}{P\left(\{b_i\}_{i=1}^{K-1} \mid Y, C_K\right)}, \qquad (28)$$

respectively, and where we have added subscript K to S, a and C to reflect that they apply to the focal individual, which makes its decision in the K-th place.

The assumption that all animals apply the same rules translates into the following. To apply an equation like Eq. 26 to a different individual (say, individual k) it is necessary to know its non-social information $C_k$. Remember that all these computations are made from the point of view of the focal individual, and obviously the focal individual does not have access to the non-social information of the other individuals. It may seem reasonable for the focal animal to assume that all the other individuals have the same non-social information ($C_K$), but this would result in no social behavior at all (if the other individuals have the same non-social information, their behaviors will not give any extra information). Instead, one can assume that the other individuals may have a different non-social information, $C'$. Furthermore, this non-social information depends on which is the best choice, because if for example x is the best choice the other individuals have some probability of detecting it, and therefore their non-social information will be on average biased towards x. We approximate this average bias by assuming that, if x (y) is the best choice, all the other individuals will have non-social information $C'_x$ ($C'_y$) that will bias the decision towards x (y). It is therefore the same to assume that x (y) is the best option as to assume that all the other individuals have non-social information $C'_x$ ($C'_y$). Therefore, for the probabilities of individual behaviors in Eq. 25 we have that

$$P\left(b_k \mid Y, C_K, \{b_i\}_{i=1}^{k-1}\right) = P\left(b_k \mid C'_y, C_K, \{b_i\}_{i=1}^{k-1}\right), \qquad (29)$$

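where now $C'_y$ applies to the k-th individual, so we can compute this probability simply by applying Eq. 26 to the k-th individual,

$$P_{y_k,Y} = \frac{1}{1 + a'_y S_k}, \qquad (30)$$

where

$$a'_y = \frac{P(X \mid C'_y)}{P(Y \mid C'_y)}. \qquad (31)$$

Then, if we denote $P_{b_k,Y} = P(b_k \mid C'_y, C_K, \{b_i\}_{i=1}^{k-1})$, we have that

$$P_{b_k,Y} = \begin{cases} P_{y_k,Y} & \text{if } b_k = y \\ 1 - P_{y_k,Y} & \text{if } b_k = x. \end{cases} \qquad (32)$$

These are the individual probabilities needed in Eq. 25, which takes into account the correlations among the other individuals. So we can already calculate $S_k$ using Eq. 28 (with $P_{b_i,X}$ defined in the same way as $P_{b_i,Y}$, but under the assumption that x is the best choice),

$$S_k = \frac{\prod_{i=1}^{k-1} P_{b_i,X}}{\prod_{i=1}^{k-1} P_{b_i,Y}}. \qquad (33)$$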



Eqs. 30 and 33 have a recursive relation, because we need the probabilities up to step $k-1$ to compute $S_k$, and then we need $S_k$ to compute the probabilities in step k. At the beginning no individual has made any choices, so we start with $S_1 = 1$ and work recursively from there until we obtain the probabilities for individual $K-1$, which allow us to compute $S_K$. Then, we can use Eq. 26 to compute the decision probability of the focal individual, this time using its actual non-social term $a_K$ (which is 1 for the symmetric cases, and fitted to the data in the non-symmetric case).
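
The recursion can be sketched in a few lines of Python. This is an illustrative sketch only, not the Matlab code of Protocol S2: the function name, the argument names and the default $a'_x = 5$ are our own assumptions, and we assume $a'_y = 1/a'_x$ for the second hypothesis.

```python
def decision_probability(sequence, a_prime_x=5.0, a_focal=1.0):
    """Return (S_K, P_y) for a focal animal that observed `sequence`.

    sequence: earlier decisions in order, each "x" or "y".
    a_prime_x: non-social term assumed for the others when x is best;
               a'_y = 1 / a'_x is assumed for the case in which y is best.
    a_focal: actual non-social term a_K of the focal animal.
    """
    a_prime_y = 1.0 / a_prime_x
    S = 1.0  # S_1 = 1: nobody has decided yet
    for b in sequence:
        # Probability that individual k goes to y under each hypothesis
        # (Eq. 30 and its analogue for the hypothesis that x is best).
        p_y_if_X = 1.0 / (1.0 + a_prime_x * S)
        p_y_if_Y = 1.0 / (1.0 + a_prime_y * S)
        # Probability of the behavior actually observed (Eq. 32).
        p_b_if_X = p_y_if_X if b == "y" else 1.0 - p_y_if_X
        p_b_if_Y = p_y_if_Y if b == "y" else 1.0 - p_y_if_Y
        # Eq. 33, accumulated one factor at a time.
        S *= p_b_if_X / p_b_if_Y
    return S, 1.0 / (1.0 + a_focal * S)  # Eq. 26


for seq in (["x", "y"], ["y", "x"]):
    print(seq, decision_probability(seq)[1])
```

The two sequences printed above contain the same number of animals at each side, yet they give different probabilities, because in this model the order of the decisions matters.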

The equations above constitute the model taking into account dependencies. The new parameters of this model are $a'_x$ and $a'_y$, which substitute $s_x$ and $s_y$ in the previous models, so the number of parameters is exactly the same. In the symmetrical case we must have that $a'_x = 1/a'_y$, so the model has a single parameter. For the non-symmetrical case these parameters may be independent of each other, but we find good results even assuming that they are not, as was the case for the simplified model. So for simplicity we always assume that

$$a'_x = 1/a'_y. \qquad (34)$$

For the case with different replicas at each side, each of them has a different value of $a'_x$, thus making one replica more attractive than the other.

The new model also matches very well with the experimental data discussed in this paper. Results for the case of two different replicas are shown in Fig. 9, for the symmetric case in Fig. S4, and for the case with predator in Fig. S5. Fits are robust, and all cases are well explained by the model with the same value of $a'_x = 5$ (Fig. S6). See Figs. S1-S3 for a comparison of all models.

We now ask how different the model including dependencies is from the model that neglects them. To compare the two models, we plot the probability of going to y as a function of $\Delta n = n_y - n_x$ for the new model, as we did in Fig. 1B for the old one. The inclusion of dependencies has the consequence that the probability of going to y does not depend only on $\Delta n$, since now different states with the same $\Delta n$ may have different probabilities. Therefore, when we plot the probability of going to y as a function of $\Delta n$ we obtain different values of the probability for each value of $\Delta n$. This is shown by the black dots in Fig. 8B, where the size of the dots is proportional to the probability of observing each state when starting from 0:0. The red line shows the average probability for each $\Delta n$, taking into account the probability of each state. Both the dots and this line correspond to $a'_x = 5$, which is the value that best fits the data. The green line corresponds to the probability for the simplest model neglecting dependencies, with the value that best fits the data (s = 2.5). This line is close to the mean probability for the new model and to the values with highest probability of occurrence, so the simple model is a good approximation to the model with dependencies.

Figure 9. Comparison between model including dependencies and stickleback choices with two differently modified replicas. (A) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia and with one replica fish going to x and a different one (in size, shape or pattern) going to y (in red). (B) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [43] (blue histogram) and results from the model that takes dependencies into account (red line, with $a'_x = 4.8$ and $a'_{x,\mathrm{replicas}}$ = 21.4, 11.8, 0.6, 9.9, 4.8, 0.9, 13, 8, 0.7, 14.5, 0.9, for each type of replica (large, medium, small, fat, etc.); red region: 95% confidence interval; green line with $a'_x = 5$ and same $a'_{x,\mathrm{replicas}}$ as for red line). Different graphs correspond to different stickleback group sizes and different types of replicas going to x and y.

We find an interesting prediction of the new model: there are some states in which the most likely option is to choose the option chosen by fewer individuals (for example, note in Fig. 8B that some points with $\Delta n < 0$ are above 0.5). This surprising result comes from the fact that, as more fish accumulate at one side, their choices become less and less informative (because it is very likely that they are simply following the others). If then one fish goes to the opposite side, its behavior is very informative, because it is contradicting its social information. This effect can be so strong that it may beat the effect of all the other individuals, resulting in a higher probability of following this last individual than all the individuals that decided before.

Discussion 

We have shown that probabilistic estimation in the presence of uncertainty can explain collective animal decisions. This approach generated a new expression for each experimental manipulation, Eqs. 17-22, and was naturally extended to test for more refined cognitive capacities, Eq. 23. The model was found to have a good correspondence with the data in three experimental settings (Figs. 2, 6 and 7), always giving a good fit with the social reliability parameter s in the interval 2-4. Indeed, all the data have a very good fit with s = 2.5 (Figs. 2, 6 and 7, green lines). According to Eq. 9, this value for s has the interpretation that, for the behaviors relevant for these experiments, the fish assume that their conspecifics make the right choice 2.5 times more often than the wrong choice.

For the data used in this paper, previous empirical fits used more parameters [42] (Figs. S1-S3, blue line), and added more complex behavioral rules when the basic model failed [43] (Fig. S2, blue line). Our approach thus gains in simplicity. It also finds an expression for each set-up, with expressions for complex set-ups obtained with add-ons to those of simpler set-ups, making the model scalable and easier to understand in terms of simpler experiments. Also, taking the models as fits to experimental data, the Bayesian information criterion finds our models to be better than those in [42] and [43] (see captions in Figs. S1-S3 for details).

Collective animal behavior has been subject to a particularly careful quantitative analysis. Previous studies have given descriptions led by the powerful idea that complex collective behaviors can emerge from simple individual rules. In fact, some systems have been found empirically to obey rules that are mathematically similar or the same as some of the ones presented in this paper, further supporting the idea that probabilistic estimation might underlie collective decision rules in many species. For example, a function like the one in Eq. 17 has been used to describe the behavior of Pharaoh's ant [BT], a function like Eq. 22 for mosquito fish [S^, and a function like the one in the right-hand side of Eq. 22 for meerkats [63]. But despite the importance of group decisions in animals, little is known about the origin of such simple individual rules. This paper argues that probabilistic estimation can be an underlying substrate for the rules explaining collective decisions, thus helping in their evolutionary explanation. Also, this connection between patterns in animal collectives and a cognitive process helps to explain the similarities that exist between decision-making processes at the level of the brain and at the level of animal collectives [64,65].

Our model is naturally compatible with other theories that use a Bayesian formalism to study different aspects of behavior and neurobiology, thus contributing to a unified approach of information processing in animals. For example, it may be combined with the formalism of Bayesian foraging theory [18], through an expansion of the non-social reliability a. Related to this case, a very well studied example of use of social information is the one in which one individual can observe directly the food collected by another individual |2M55]. In this case the social information is as unambiguous as the non-social one, so both types of information should have a similar mathematical form [2&H33]. This is consistent with our model, which in this case will give a similar expression for a and S. Other kinds of social information (such as another individual's decision to leave a food patch or choices of females in mating [41]) would enter naturally in our reliability terms $s_k$. In discussing these and similar problems, it has been proposed that animals should use social information when their personal information is poor, and ignore it otherwise [^511261141]. Our model provides a quantitative framework for this problem, predicting that social information is always used, only with different weights with respect to other sources of information. Bayesian estimation is also a prominent approach to study decisions in neurobiology and psychology [5Hl7j, and it would be of interest to explore the mechanisms and role played by the multiplicative relation between non-social and social terms.

Our approach also makes a number of predictions. For example, it derives the probability of choosing among M options (see Eq. S16 of Text S1), which for the symmetric case reduces to

$$P_\mu = \left(1 + \sum_{m \neq \mu} s^{-(n_\mu - n_m)}\right)^{-1}, \qquad (35)$$

predicted also to fit the data for cases with M > 2 options.
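
As a quick check of this expression, the following Python sketch evaluates the probabilities for all options at once, using the equivalent normalized form $s^{n_\mu} / \sum_m s^{n_m}$. The function name and its arguments are our own assumptions and are not part of the Protocols.

```python
def choice_probabilities(counts, s=2.5):
    """Probability of choosing each of M symmetric options (Eq. 35).

    counts: list with the number n_m of animals already at each option.
    s: social reliability parameter.
    """
    weights = [s ** n for n in counts]
    total = sum(weights)
    return [w / total for w in weights]


# Three options with 2, 1 and 0 animals already present.
print(choice_probabilities([2, 1, 0]))
```

For two options this reduces to the symmetric two-choice rule $(1 + s^{-\Delta n})^{-1}$, which is a convenient sanity check of any implementation.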

We also predict a quantitative link between estimation and collective behavior. The parameters $a$ and $s_k$ in our model are in fact not merely fitting parameters, but true experimental variables. Manipulations of $a$ and $s_k$ should allow testing that changes in collective behavior follow the predictions of the model. A counterintuitive prediction about the manipulation of $s_k$ is that external factors unrelated to the social component can nevertheless modify it. For example, a fish that usually finds food in a given environment should interpret a sudden turn of one of its mates as an indication that it has found food, and therefore will follow it. In contrast, another fish that is not expected to find food in that environment will not interpret the sudden turn as indicative of food, and will not follow. Thus, the model predicts that the a priori probability of finding food (to which each fish can be trained in isolation) will modify its propensity to follow conspecifics. An alternative approach that would not need manipulation of the reliabilities $s_k$ would consist in showing that the probability of copying a behavior increases with how reliably the behavior informs about the environment.

We can also extend the estimation model to use, instead of the location of animals, their predicted location. We would then find expressions like the ones in this paper but for the number or density of individuals estimated for a later time. Consider for example the case without non-social information, described in Eq. 17 for two options and in Eq. 35 for more options. We can rewrite these equations as $P_\mu = \Omega s^{n_\mu}$, with $\mu$ one of the options and $\Omega$ the normalization, $\Omega^{-1} = \sum_{m=1}^{M} s^{n_m}$, where M is the number of options. Then, we would have $P(x) = \Omega s^{\rho(x; t+\Delta t)}$ for the continuous case using prediction. Future positions at times $t + \Delta t$ (where $\Delta t$ does not need to be constant) in terms of variables at present time t would be given by $x + v\Delta t$ for animals moving at constant velocity v. Consider then a simple case of an animal located at x and estimating the future position of a compact group at $x_g$ and moving with velocity $v_g$. The deciding animal would be predicted to move with a high probability in the direction $(x_g(t) - x(t)) + \Delta t\, v_g(t)$. Estimation of future locations thus naturally predicts in this simple case a particular form of 'attraction' and 'alignment' forces of dynamical empirical models PSIISS] as attraction to future positions, but in general also deviations from these simple rules.

Methods 

Obtaining group behavior from the 
model of an individual 

The estimation rules presented in this paper refer 
to a single individual. To simulate the behavior 
of a group, we use the following algorithm: The 
current individual decides between x and y. After 
the decision, we recompute the relevant parameters 
of the model and use the new values for the next 
deciding individual. The undecided individuals are 
only those that are waiting for their turn to decide. 
We tested an alternative implementation in which 
individuals may remain undecided or in which two 
individuals can decide simultaneously, obtaining no 
relevant differences. 
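
The sequential procedure can be illustrated with the following Python sketch for the model without dependencies. It is not the code of Protocol S1: instead of drawing random decisions, it propagates the probabilities of all intermediate states exactly, which gives the distribution of final configurations directly. The function and argument names are our own assumptions.

```python
from collections import defaultdict


def final_configuration_distribution(group_size, n_x0=0, n_y0=0, a=1.0, s=2.5):
    """Distribution of the number of fish that end up at y.

    group_size: number of real fish, deciding one after another.
    n_x0, n_y0: replicas already going to x and y before any fish decides.
    a, s: non-social term and social reliability of the simple rule.
    Returns a list whose k-th entry is the probability that k fish chose y.
    """
    dist = {0: 1.0}  # fish at y so far -> probability of that state
    for decided in range(group_size):
        new_dist = defaultdict(float)
        for fish_y, prob in dist.items():
            fish_x = decided - fish_y
            delta_n = (n_y0 + fish_y) - (n_x0 + fish_x)
            p_y = 1.0 / (1.0 + a * s ** (-delta_n))
            new_dist[fish_y + 1] += prob * p_y      # current fish goes to y
            new_dist[fish_y] += prob * (1.0 - p_y)  # current fish goes to x
        dist = dict(new_dist)
    return [dist.get(k, 0.0) for k in range(group_size + 1)]


print(final_configuration_distribution(4))
```

In the symmetric case without replicas the resulting histogram is symmetric, with the configurations in which most fish choose the same side being the most likely ones.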

For the case of the model including dependencies, 
the model always starts at state 0:0, with S = 1.
Most experiments have initial conditions in which 
several replicas are already going to either side, and 
the fish have no information about the path followed 
to reach this state. In these cases, we average the 
probabilities of all the paths that might have pos- 
sibly led to the initial state to compute the initial 
value of S. 

Protocol S1 and Protocol S2 contain Matlab functions that run the models (extensions of the files must be changed from .txt to .m to make them operative). Protocol S1 corresponds to the model without dependencies, and Protocol S2 corresponds to the model with dependencies. These functions have been used to generate all the theoretical results presented in this paper.

Fits 

We computed log likelihood as the logarithm of the probability that the histograms come from the model. We searched for the model parameters giving the highest value of log likelihood, corresponding to the best fit. This search was performed by optimizing each parameter separately (keeping the rest constant) and iterating through all parameters until convergence. In all cases convergence was rapidly achieved. We performed multiple searches for best fitting parameters starting from random initial conditions and always found convergence to the same values, suggesting there are no local maxima. Indeed, we observed that log-likelihood is smooth and with a single maximum in all the cases with 1 or 2 parameters (see Fig. 2 for an example).
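
The search can be sketched in Python as follows. This is a hedged illustration and not the code used for the paper: the one-dimensional optimization of each parameter is done here by a simple scan over a user-supplied grid (the text does not specify the method), the constant combinatorial factor of the histogram probability is omitted because it does not depend on the parameters, and all names (`log_likelihood`, `coordinate_ascent`, `model`, `grids`) are our own assumptions.

```python
import math


def log_likelihood(params, histograms, model):
    """Sum of count * log(probability) over all bins of all histograms.

    histograms: list of (condition, counts) pairs.
    model: function (condition, params) -> list of bin probabilities.
    """
    total = 0.0
    for condition, counts in histograms:
        probs = model(condition, params)
        total += sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
    return total


def coordinate_ascent(params, histograms, model, grids, max_sweeps=50):
    """Optimize one parameter at a time, keeping the rest constant."""
    params = list(params)
    best = log_likelihood(params, histograms, model)
    for _ in range(max_sweeps):
        improved = False
        for i, grid in enumerate(grids):
            for value in grid:
                trial = params[:i] + [value] + params[i + 1:]
                ll = log_likelihood(trial, histograms, model)
                if ll > best + 1e-12:
                    best, params, improved = ll, trial, True
        if not improved:  # a full sweep changed nothing: converged
            break
    return params, best
```

Repeating the search from several starting points, as described above, is a simple way to check that the maximum found does not depend on the initial conditions.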

Bayesian Information Criterion 

For model comparison we used the Bayesian Information Criterion (BIC) [59,60], which takes into account both goodness of fit and the number of parameters. According to this criterion, among several models that have been fitted to maximize log likelihood, one should select the one for which

$$\mathrm{BIC}_i = L_i - k_i \log(h) \qquad (36)$$

is largest, where $L_i$ is the logarithm of the probability that the data come from the i-th model once its parameters have been optimized to maximize this probability, $k_i$ is the number of parameters of the i-th model and h is the number of measurements (which in our case is the same for all models).
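
For illustration, the criterion and the normalized BIC weights (Eq. 37) can be computed as in the following Python sketch. It assumes that the penalty term is $k_i \log(h)$, as Eq. 36 is read here, and that the weights are the values $\exp(\mathrm{BIC}_i)$ normalized to sum to one; the function names are our own assumptions.

```python
import math


def bic(log_likelihood, n_params, n_measurements):
    """BIC as read from Eq. 36: larger values indicate better models."""
    return log_likelihood - n_params * math.log(n_measurements)


def bic_weights(bic_values):
    """Normalized exp(BIC_i) for a list of models.

    The largest BIC is subtracted before exponentiating, which leaves the
    weights unchanged but avoids numerical underflow for very negative values.
    """
    top = max(bic_values)
    unnormalized = [math.exp(b - top) for b in bic_values]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

Because only differences between BIC values enter the weights, two models with the same number of parameters are compared through the difference of their log likelihoods alone.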

More intuitive than the direct $\mathrm{BIC}_i$ values in Eq. 36 are the BIC weights, defined as [60]

$$w_i = \frac{\exp(\mathrm{BIC}_i)}{\sum_j \exp(\mathrm{BIC}_j)} \qquad (37)$$

when we assume that all models are a priori equally likely. Roughly speaking, $w_i$ can be interpreted as the probability that model i is the most correct one [60].

We used BIC to compare different versions of our model, and also to compare our model with those of references [42,43] (see Figs. S1-S3). The models of refs. [42,43] were originally fitted by minimizing the mean squared error instead of by maximizing logprob. For this reason, they score very poorly in BIC with their reported parameters. Therefore, we re-optimized for maximum logprob all their model parameters (these parameters are, using the notation of refs. [42,43], a, k, T, m and r, with r only applicable in the case of predator present). For the case of different replicas going to each side, parameter $p_{bias}$ takes a different value for each row in the figure, adding up to 10 parameters. The model in ref. [43] is computationally expensive, so it is not feasible to re-optimize this many parameters. Therefore, we treated them as if they were independently measured: we fixed $p_{bias}$ in each case so that the results of the trials with a single individual matched exactly the model's prediction (as reported in [43]). We also followed this procedure with the ratios $s_r/s_R$ of our model without dependencies, and the pairs $a'_{x,\mathrm{replicas}}$ of our model with dependencies. Then, we performed BIC taking into account neither these parameters ($p_{bias}$, the ratios $s_r/s_R$ and the pairs $a'_{x,\mathrm{replicas}}$) nor the data from trials using single individuals.

Supporting Figures



Figure S1. Comparison between different models for the symmetric set-up. Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histograms). Red line: results from our single-parameter model assuming independence in Eq. 17 in the main text (s = 2.2). Green line: Enhanced model assuming independence with different reliability for the replicas ($s_f = 3$, $s_r = 1.76$). Yellow line: Model including dependencies ($a'_x = 4.9$). Blue line: Empirical model presented in Ref. [42], using the parameters reported there. Different graphs correspond to different stickleback group sizes and different number of replicas going to x and y. According to Bayesian Information Criterion (BIC, see Methods), the best model is our model with dependencies (yellow line, logprob L = -394, and BIC weight w = 0.996). Second-best is the complicated version of the model without dependencies (green line, logprob L = -396, and BIC weight w = 0.004). Third-best is our one-parameter model assuming independence (red line, L = -419, w = 3 - 10~^^). And last (but not far from the third one) the model from Ref. [42] (blue line, L = -411, w = 5 - 10~^^). For the model from Ref. [42], L and w correspond to a re-optimization of the model as described in Methods, because using the
parameters reported in [42] would perform worse.

Figure S2. Comparison between different models for the condition with two different replicas. Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [43] (blue histograms). Red line: results from model in Eq. 20 in the main text ($s_f = 2.9$, $s_r/s_R$ = 0.35, 0.7, 0.5, 0.52, 0.69, 0.75, 0.43, 0.55, 0.78, 0.43 for each row from top to bottom). Yellow line: Model including dependencies ($a'_x = 4.8$, $a'_{x,\mathrm{replicas}}$ = 21.4, 11.8, 0.6, 9.9, 4.8, 0.9, 13, 8, 0.7, 14.5, 0.9 for each type of replica (large, medium, small, etc.)). Blue line: Empirical model presented in Ref. [43], using the parameters reported there. Different graphs correspond to different stickleback group sizes and different types of replicas going to x and y. According to Bayesian Information Criterion (BIC, see Methods), our model neglecting dependencies gives the best representation of the data (red line, logprob L = -783, and BIC weight w = 0.9985). Second-best is our model including dependencies (L = -788, w = 0.001). Last, but near the second one, is the model from Ref. [43] (blue line, L = -781, w = 0.0005). For the model from Ref. [43], these values of L and w correspond to a re-optimization of the model as described in Methods, because using the parameters reported in [43] would perform worse. The values of
logprob (L) reported here do not include the data of the single-individual experiments (see Methods).

Figure S3. Comparison between different models in the asymmetrical set-up. Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histograms). Red line: results from model neglecting dependencies in Eq. 22 in the main text (s = 2.6, a = 9.5). Green line: Enhanced model neglecting dependencies with different reliability for the fish going to different locations and for the replicas (a = 5.5, $s_{fx} = 50$, $s_{fy} = 2/3$, $s_{ry} = 0.36$; $s_{rx}$ has no effect because there are no replicas going to x). Yellow line: Two-parameter model including dependencies (a = 9.94, $a'_x = 8.66$). Blue line: Empirical model presented in Ref. [42], using the parameters reported there. Different graphs correspond to different stickleback group sizes and different number of replicas going to y. According to Bayesian Information Criterion (BIC, see Methods), the best two models are our complicated version neglecting dependencies (green line, logprob L = -225, and BIC weight w = 0.52) and our two-parameter model including dependencies (yellow line, L = -231, w = 0.38). Next (but very near) is our simplified model (red line, L = -232, w = 0.098). And last (and significantly worse) the model from Ref. [42] (blue line, L = -234, w = 2.5 ■ 10~®). For the model from Ref. [42], the values of L and w correspond to a re-optimization of the model as described in Methods, because using the parameters reported in [42] would perform worse. In two of the graphs for group size 1, for which there are no data, the predictions of the model from Ref. [42] and of our model (especially the simplest version) are opposite. The results might therefore change completely depending on the outcome of these graphs, were the experiments performed. But we found that this is not the case: we performed simulations, adding experimental data in these two graphs. Even in the extreme case that the fabricated results matched exactly the predictions of the model in Ref. [42], BIC would still favour two of our models (we would get L = -254, w = 0.99 for our model with dependence, L = -252, w = 0.01 for our complicated model neglecting dependence, L = -268,
w = 8 • 10"'^ for our simplified model neglecting dependence and L = -258, w = 3 ■ 10 ® for the model in [42]).

Figure S4. Comparison between model including dependencies and stickleback choices in symmetric set-up. (A) Schematic diagram of symmetric set-up with a group of sticklebacks (in black) choosing between two identical refugia and with different numbers of replica fish (in red) going to x and y. (B) Experimentally measured statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from the model that takes into account dependencies (red line using $a'_x = 4.9$; red region: 95% confidence interval; green line with $a'_x = 5$). Different
graphs correspond to different stickleback group sizes and different number of replicas going to x and y. 
Figure S5. Comparison between model including dependencies and stickleback choices in asymmetric set-up.
(A) Schematic diagram of asymmetric set-up (predator at y, large fish depicted in red) with a group of sticklebacks (in
black) choosing between two refugia, and replica fish (small fish depicted in red) going to y. (B) Experimentally measured
statistics of final configurations of fish choices from 20 experimental repetitions [42] (blue histogram) and results from the
model that takes into account the dependencies (red line using a'^ = 8.7, a = 9.9; red region: 95% confidence interval;
green line using a'^ = 5 and a = 6.28). Different graphs correspond to different stickleback group sizes and different
numbers of replicas going to y.
Figure S6. Goodness of fit of the model including dependencies for different values of a'^. Red: Symmetric
case (data in Fig. S4). Green: Case with different replicas at each side (data in Fig. 9; the parameters for the replicas are
re-optimized for each value of a'^). Blue: Asymmetric set-up with predator on one side (data in Fig. S5; parameter a is
re-optimized for each value of a'^). (A) Root mean squared error (RMSE) between the data and the probabilities predicted by the
model. Grey dashed line shows the mean RMSE for the three cases. The absolute values for each case depend on the shape
of the data and are not comparable; only the trends and the position of the minima should be compared. (B) Logarithm
of the probability that the data come from the model. The height of each curve depends on the number of data for each
experiment; only the trend and the position of the maxima should be compared. Grey dashed line shows the sum of the
three coloured lines, but shifted by 1000 so that it fits on the scale. The peak of this global probability indicates the value
of a'^ that best fits the three datasets (a'^ = 5).
Supporting text: Model for more than 2 options

We present a derivation of the model for the more
general case of M different options (instead of the
2 options used in the main text). We also discuss
some particular cases that give simple expressions
while remaining widely applicable.

Model for M options 

Let $M$ be the number of possible options, $y_m$,
$m = 1 \ldots M$. Each individual estimates the
probability that each option is the best one, using
its non-social information ($C$) and the behavior of
the other individuals ($B$). So for one given option,
say $y_\mu$, we want to compute

$$P(Y_\mu|C,B), \tag{S1}$$

where $Y_\mu$ stands for '$y_\mu$ is the best option'.
We can compute the probability in Eq. S1 using Bayes'
theorem,

$$P(Y_\mu|C,B) = \frac{P(B|Y_\mu,C)\,P(Y_\mu|C)}{\sum_{m=1}^{M} P(B|Y_m,C)\,P(Y_m|C)}. \tag{S2}$$

Dividing numerator and denominator by the numerator,
we get

$$P(Y_\mu|C,B) = \frac{1}{\sum_{m=1}^{M} a_{m\mu} S_{m\mu}}, \tag{S3}$$

where

$$a_{m\mu} = \frac{P(Y_m|C)}{P(Y_\mu|C)} \tag{S4}$$

contains only non-social information, and

$$S_{m\mu} = \frac{P(B|Y_m,C)}{P(B|Y_\mu,C)} \tag{S5}$$

contains the social information. Note that each
term of the summation preserves the multiplicative
relation between social and non-social information
that was also apparent in Eq. 3 of the main text.
There may be $M - 1$ independent non-social
parameters $a_{m\mu}$ in the case that no two options
have equal non-social information. But usually this
will not be the case, and the number of independent
non-social parameters will be lower.
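
As a minimal numerical check of Eqs. S2 to S5, the following Python sketch compares the direct Bayesian normalisation with the ratio form built from $a_{m\mu}$ and $S_{m\mu}$. The prior and likelihood values, and the function names, are hypothetical and chosen only for illustration.

```python
def posterior_bayes(prior, likelihood):
    """Eq. S2: normalise P(B|Y_m,C) * P(Y_m|C) over all M options."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]


def posterior_ratio_form(prior, likelihood, mu):
    """Eq. S3 for option mu: 1 / sum_m a_{m mu} * S_{m mu}."""
    a = [p / prior[mu] for p in prior]            # Eq. S4, non-social ratios
    S = [l / likelihood[mu] for l in likelihood]  # Eq. S5, social ratios
    return 1.0 / sum(a_m * S_m for a_m, S_m in zip(a, S))


# Hypothetical values for M = 3 options (illustration only).
prior = [0.5, 0.3, 0.2]          # P(Y_m | C)
likelihood = [0.10, 0.40, 0.25]  # P(B | Y_m, C)

direct = posterior_bayes(prior, likelihood)
ratio = [posterior_ratio_form(prior, likelihood, mu)
         for mu in range(len(prior))]
print(direct)
print(ratio)
```

Both lists should agree up to rounding, since Eq. S3 is only an algebraic rearrangement of Eq. S2.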



Now we assume independence among behaviors
(Eq. 6 in main text), and group all possible
behaviors in $L$ classes, $\{\beta_k\}_{k=1}^{L}$
(Eq. 7 in main text). These two assumptions transform
Eq. S5 into

$$S_{m\mu} = \prod_{k=1}^{L} s_{k,m\mu}^{-n_k}, \tag{S6}$$

where $n_k$ is the number of individuals performing
behavior $\beta_k$, and

$$s_{k,m\mu} = \frac{P(\beta_k|Y_\mu,C)}{P(\beta_k|Y_m,C)} \tag{S7}$$

are the reliability parameters for behavior $\beta_k$
with respect to options $y_m$ and $y_\mu$. There may
be up to $L(M - 1)$ independent reliability parameters
but usually they will not be all independent.

In summary, from Equations S3 and S6 we have
that

$$P(Y_\mu|C,B) = \left( \sum_{m=1}^{M} a_{m\mu} \prod_{k=1}^{L} s_{k,m\mu}^{-n_k} \right)^{-1}. \tag{S8}$$

This equation summarizes the general model,
applicable to any kind of experiment. In the
following sections we consider two particular cases
with a much simpler expression.
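
The general model of Eq. S8 can be sketched in Python as follows. This is a minimal illustration, assuming the convention $s_{k,m\mu} = P(\beta_k|Y_\mu,C)/P(\beta_k|Y_m,C)$ with exponent $-n_k$; the function name, the table `beh_prob` and all numerical values are hypothetical.

```python
def general_posterior(prior, beh_prob, counts):
    """Eq. S8 evaluated for every option mu.

    prior[m]       : P(Y_m | C), non-social information
    beh_prob[k][m] : P(beta_k | Y_m, C) for behavior class k
    counts[k]      : n_k, number of individuals performing beta_k
    """
    M = len(prior)
    L = len(counts)
    post = []
    for mu in range(M):
        total = 0.0
        for m in range(M):
            a = prior[m] / prior[mu]                  # Eq. S4
            S = 1.0
            for k in range(L):
                # Reliability parameter, convention assumed in this sketch.
                s = beh_prob[k][mu] / beh_prob[k][m]  # Eq. S7
                S *= s ** (-counts[k])                # Eq. S6
            total += a * S
        post.append(1.0 / total)
    return post


# Hypothetical example: M = 3 options, L = 2 behavior classes.
prior = [0.4, 0.4, 0.2]
beh_prob = [[0.6, 0.2, 0.2],
            [0.1, 0.5, 0.3]]
counts = [3, 1]
post = general_posterior(prior, beh_prob, counts)
print(post)
```

The result can be compared with a direct application of Bayes' theorem, in which each option is weighted by its prior times the product of the behavior probabilities raised to $n_k$.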

One basic reliability parameter 



The general model in Eq. S8 depends on up to
$L(M - 1)$ independent reliability parameters
$s_{k,m\mu}$. Here we derive the model for a
particular case in which there is only one
reliability parameter, $s$.

First, we consider classes of behaviors (from now
on we call them just 'behaviors') that simply
consist of choosing a given option. If for example
the options are different places, behaviors would be
going to each of those places. Therefore, the number
of possible behaviors is the same as the number of
options, $L = M$. We use the convention that
$\beta_j$ is 'choosing option $y_j$'. Note that when
a behavior is not informative (i.e. its reliability
parameter is 1) it has no impact on the model in
Eq. S8. Therefore, considering this set of behaviors
is equivalent to assuming that all other behaviors
have reliability parameter equal to 1.
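
A short Python sketch can illustrate why non-informative behaviors drop out. It assumes that the social term for one pair of options is a product of reliability parameters raised to $-n_k$; the function name and the numbers are hypothetical.

```python
def social_term(reliabilities, counts):
    """Social term for one pair of options: prod_k s_k ** (-n_k)."""
    S = 1.0
    for s_k, n_k in zip(reliabilities, counts):
        S *= s_k ** (-n_k)
    return S


# Two informative behaviors (hypothetical reliabilities and counts).
informative_only = social_term([2.5, 0.4], [3, 1])

# Same behaviors plus a non-informative one (s = 1) performed by 7 animals.
with_neutral = social_term([2.5, 0.4, 1.0], [3, 1, 7])

print(informative_only, with_neutral)
```

Because a factor of 1 raised to any power is still 1, the two values coincide however many individuals perform the non-informative behavior.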

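We further assume that $P(\beta_k|Y_m,C)$ only
depends on whether $k = m$ or $k \neq m$, so that

$$
\begin{aligned}
P(\beta_k|Y_k,C) &= P(\beta_l|Y_l,C), \\
P(\beta_k|Y_m,C) &= P(\beta_l|Y_p,C), \qquad k \neq m,\ l \neq p.
\end{aligned} \tag{S9}
$$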
Note that $P(\beta_k|Y_k,C)$ is the probability that
another individual makes the correct choice, and
$P(\beta_k|Y_m,C)$ with $k \neq m$ is the probability
that it makes a wrong choice. So this assumption means
that the probability of making the correct choice is
the same regardless of which option is actually the
correct one. In the case of symmetric choices, in
which non-social information $C$ is the same for all
options, this relation holds automatically and is not
an extra assumption. It is likely that it also
holds for many asymmetric choices. For example,
the results for the asymmetric set-up presented in
the main text suggest that it holds in that case. We
define

$$
\begin{aligned}
p_c &\equiv P(\beta_k|Y_k,C), \\
p_f &\equiv P(\beta_k|Y_m,C), \qquad k \neq m.
\end{aligned} \tag{S10}
$$

As it only matters whether the behavior matches
the correct choice or not, there are only four distinct
types of reliability parameters $s_{k,m\mu}$ (Eq. S7):

$$
\begin{aligned}
s_{k,kk} &= \frac{P(\beta_k|Y_k,C)}{P(\beta_k|Y_k,C)} = \frac{p_c}{p_c} = 1, \\
s_{k,lm} &= \frac{P(\beta_k|Y_m,C)}{P(\beta_k|Y_l,C)} = \frac{p_f}{p_f} = 1, & k \neq m,\ k \neq l, \\
s_{k,mk} &= \frac{P(\beta_k|Y_k,C)}{P(\beta_k|Y_m,C)} = \frac{p_c}{p_f} = s, & k \neq m, \\
s_{k,km} &= \frac{P(\beta_k|Y_m,C)}{P(\beta_k|Y_k,C)} = \frac{p_f}{p_c} = \frac{1}{s}, & k \neq m,
\end{aligned} \tag{S11}
$$

where

$$s \equiv \frac{p_c}{p_f} \tag{S12}$$

is the basic reliability parameter, equal to the
probability that another individual makes the correct
choice over the probability that it makes a mistake,
for any behavior and for any individual. We regroup
the terms in Eq. S8 so that it reflects the different
types of $s_{k,m\mu}$ (Eq. S11), and get



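$$P(Y_\mu|C,B) = \left( \sum_{m=1}^{M} a_{m\mu}\, s_{\mu,m\mu}^{-n_\mu}\, s_{m,m\mu}^{-n_m} \prod_{k=1,\, k \neq \mu, m}^{L} s_{k,m\mu}^{-n_k} \right)^{-1}, \tag{S13}$$

where for $m = \mu$ every factor equals 1.
Using the relations in Eq. S11 we have that

$$P(Y_\mu|C,B) = \left( \sum_{m=1}^{M} a_{m\mu}\, s^{-(n_\mu - n_m)} \right)^{-1}. \tag{S14}$$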



Figure S7. Probability of choosing one of the options for
the 3-choice symmetric case.



Note that the term $m = \mu$ is always equal to 1, so
Eq. S14 is identical to

$$P(Y_\mu|C,B) = \left( 1 + \sum_{m=1,\, m \neq \mu}^{M} a_{m\mu}\, s^{-(n_\mu - n_m)} \right)^{-1}, \tag{S15}$$

which has the same structure as the equations
presented in the main text.
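
The one-parameter model of Eq. S15 can be sketched in Python as below, together with the probability-matching stage described in the Abstract (each animal chooses an option with probability equal to its estimated probability). The function names, the priors, the counts and the value of $s$ are hypothetical and serve only as an illustration.

```python
import random


def posterior_one_parameter(prior, s, n):
    """Eq. S15 for every option mu.

    prior[m] : P(Y_m | C), so that a_{m mu} = prior[m] / prior[mu]
    s        : basic reliability parameter
    n[m]     : number of individuals that have chosen option y_m
    """
    M = len(prior)
    post = []
    for mu in range(M):
        total = 1.0  # the m = mu term of Eq. S14
        for m in range(M):
            if m != mu:
                a = prior[m] / prior[mu]
                total += a * s ** (-(n[mu] - n[m]))
        post.append(1.0 / total)
    return post


def choose_option(post, rng):
    """Probability matching: pick option mu with probability post[mu]."""
    return rng.choices(range(len(post)), weights=post, k=1)[0]


# Hypothetical example with M = 3 options.
prior = [0.5, 0.3, 0.2]
n = [3, 1, 0]
post = posterior_one_parameter(prior, s=2.5, n=n)
choice = choose_option(post, random.Random(0))
print(post, choice)
```

In this form each option is effectively weighted by its prior times $s^{n_m}$, so the probabilities over the $M$ options add up to 1.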

Symmetric case 

In the special case that all options are
indistinguishable using non-social information alone
(symmetric case), all non-social parameters are equal
to 1 and Eq. S15 becomes

$$P(Y_\mu|C,B) = \left( 1 + \sum_{m=1,\, m \neq \mu}^{M} s^{-(n_\mu - n_m)} \right)^{-1}. \tag{S16}$$

We recall that in this case Eq. S9 holds
automatically, not being an extra assumption.

In the particular case of 3 options, $x$, $y$, $z$, we
have

$$P(X|C,B) = \left( 1 + s^{-(n_x - n_y)} + s^{-(n_x - n_z)} \right)^{-1}, \tag{S17}$$

and the corresponding expressions for $P(Y|C,B)$
and $P(Z|C,B)$. Fig. S7 shows $P(X|C,B)$ in terms
of its two effective variables, $n_x - n_y$ and
$n_x - n_z$ (Eq. S17).
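
As a worked illustration of the 3-option symmetric expression, take the hypothetical values $s = 2$ and $(n_x, n_y, n_z) = (3, 1, 0)$, which are chosen only for arithmetic convenience and are not fitted to any data:

```latex
\begin{aligned}
P(X|C,B) &= \left(1 + 2^{-(3-1)} + 2^{-(3-0)}\right)^{-1}
          = \left(\tfrac{11}{8}\right)^{-1} = \tfrac{8}{11} \approx 0.727, \\
P(Y|C,B) &= \left(1 + 2^{-(1-3)} + 2^{-(1-0)}\right)^{-1}
          = \left(\tfrac{11}{2}\right)^{-1} = \tfrac{2}{11} \approx 0.182, \\
P(Z|C,B) &= \left(1 + 2^{-(0-3)} + 2^{-(0-1)}\right)^{-1}
          = 11^{-1} \approx 0.091 .
\end{aligned}
```

The three probabilities add up to 1, and each is proportional to $s^{n}$ for the corresponding option (8, 2 and 1 here).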